tutorials/024 - Athena Query Metadata.ipynb (159 lines of code) (raw):
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[](https://github.com/aws/aws-sdk-pandas)\n",
"\n",
"# 24 - Athena Query Metadata\n",
"\n",
"For `wr.athena.read_sql_query()` and `wr.athena.read_sql_table()` the resulting DataFrame (or every DataFrame in the returned Iterator for chunked queries) have a `query_metadata` attribute, which brings the query result metadata returned by Boto3/Athena.\n",
"\n",
"The expected `query_metadata` format is the same returned by:\n",
"\n",
"https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/athena.html#Athena.Client.get_query_execution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Environment Variables"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"env: WR_DATABASE=default\n"
]
}
],
"source": [
"%env WR_DATABASE=default"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import awswrangler as wr"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>foo</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" foo\n",
"0 1"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = wr.athena.read_sql_query(\"SELECT 1 AS foo\")\n",
"\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting statistics from query metadata"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"DataScannedInBytes: 0\n",
"TotalExecutionTimeInMillis: 2311\n",
"QueryQueueTimeInMillis: 121\n",
"QueryPlanningTimeInMillis: 250\n",
"ServiceProcessingTimeInMillis: 37\n"
]
}
],
"source": [
"print(f\"DataScannedInBytes: {df.query_metadata['Statistics']['DataScannedInBytes']}\")\n",
"print(f\"TotalExecutionTimeInMillis: {df.query_metadata['Statistics']['TotalExecutionTimeInMillis']}\")\n",
"print(f\"QueryQueueTimeInMillis: {df.query_metadata['Statistics']['QueryQueueTimeInMillis']}\")\n",
"print(f\"QueryPlanningTimeInMillis: {df.query_metadata['Statistics']['QueryPlanningTimeInMillis']}\")\n",
"print(f\"ServiceProcessingTimeInMillis: {df.query_metadata['Statistics']['ServiceProcessingTimeInMillis']}\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.14",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.14"
}
},
"nbformat": 4,
"nbformat_minor": 4
}